ABSTRACT 

In this paper we introduce a new type of pattern - a flipping 
correlation pattern. The flipping patterns are obtained from 
contrasting the correlations between items at different lev- 
els of abstraction. They represent surprising correlations, 
both positive and negative, which are specific for a given 
abstraction level, and which "flip" from positive to nega- 
tive and vice versa when items are generalized to a higher 
level of abstraction. We design an efficient algorithm for 
finding flipping correlations, the Flipper algorithm, which 
outperforms naive pattern mining methods by several or- 
ders of magnitude. We apply Flipper to real-life datasets 
and show that the discovered patterns are non-redundant, 
surprising and actionable. Flipper finds strong contrasting 
correlations in itemsets with low-to-medium support, while 
existing techniques cannot handle the pattern discovery in 
this frequency range. 

Categories and Subject Descriptors 

I.5.1 [Pattern Recognition]: Models - Statistical; H.2.8 
[Database Applications]: Data Mining 

General Terms 

Algorithms, Theory, Experimentation 

Keywords 

Flipping Correlation, Itemset Mining 

1. INTRODUCTION 

One of the central tasks in data mining is finding correla- 
tions in binary relations. Often, this is formulated as a mar- 
ket basket problem [1], in which items occurring together are 
organized into a set of transactions (market baskets). The 
central goal of this line of work is to find correlations among 
items based on their recurrent co-appearances among the set 
of transactions. Such correlations represent the similarity of 
the correlated items in respect to their togetherness - e.g., 
items "bought together", words "used together", genes
"mutated together". They present valuable information, and the 
market basket concept has been successfully applied to var- 
ious domains such as climatology [21], public health [4], and 
bioinformatics [11, 24]. 

Typically, a set of one or more items is called an item- 
set. The number of transactions that contain a particular 
itemset is referred to as the itemset's support. To find whether 
particular items in an itemset are correlated, the support 
of the itemset must be compared with the support of each 
individual item in it. This is a way to determine both posi- 
tive (often appear together) and negative (rarely appear to- 
gether) correlations. Note that mining positive correlations 
is not equivalent to mining frequent itemsets. An itemset 
can be frequent without positive correlation between items, 
and very strong positive correlations can be discovered in 
itemsets with low-to-medium support. 

While very frequent itemsets can be efficiently mined due 
to the anti-monotonicity of support, an efficient algorithm 
for computing positively or negatively correlated items with 
low support is a challenge because most useful correlation 
measures are neither monotonic, nor anti-monotonic. This 
is especially true if both positive and negative correlations 
are of interest: negative correlations imply that we need 
to deal with itemsets with low support. In transactional 
databases where the number of distinct items is large such 
computation remains infeasible [19]. In this work we forsake 
the goal of mining all positive and negative correlations in 
favor of mining a new type of correlation described below. 

In many cases, the transactional data about the relative 
behavior of the items is accompanied by additional
information, based on intrinsic properties of these items. Each 
item may be described with differing amounts of detail at 
different levels of abstraction. For example, whole milk at a 
higher level is simply milk, and bagels can be generalized as 
bread. At the next level, both milk and bread can be gener- 
alized as grocery products, and so on. Each higher level of 
abstraction encompasses a group of several items and hence 
this information can be modeled as a taxonomy tree (an is-a 
hierarchy). The leaves of a taxonomy tree (or simply
taxonomy) represent items at the lowest level of abstraction. 
Each internal node is by itself an object, or an item, but 
at a higher abstraction level. The taxonomy tree is gen- 
erated manually or automatically based on some notion of 
similarity between objects. 

Our goal is to explore correlation differences across ab- 
straction levels in the taxonomy. Specifically, we identify 
a particular type of correlation called flipping correlation, 
in which the correlation value at one level of abstraction 
is in contrast with higher-level correlations. That is, the 
correlation "flips" from positive to negative and vice versa. 
Furthermore, in order to avoid the significant costs involved 
with frequent itemset mining, we mine flipping patterns
directly, based on correlation values. 

Figure 1: Fragment of taxonomy tree for Movies 
dataset: the correlations can be computed between 
specific movies (items), or between general genres 
(their generalizations), if each movie is substituted 
by its genre in the transaction. 

As a motivating example, consider the following
correlations extracted from the MovieLens dataset¹, which contains 
movie rankings and a hierarchy of movie genres. 



Example 1. To apply the market basket concept to movie 
rankings, we model each user as a single transaction. Each 
transaction contains all movies which this user ranked highly 
(at least 4 out of 5), giving us each user's favorite movies. 
We can easily find correlations between movies, that is, which 
sets of movies are almost always favored together. If we re- 
place each movie by its higher-level abstraction, movie genre, 
then we can find that users who like action movies also like 
adventure movies, but people who like romance movies rarely 
also like westerns (negative correlation). However, for the 
negatively correlated romance and western genres, we found 
two movies, shown in Figure 2(a), which are positively cor- 
related: The Big Country (1958) and High Noon (1952). 
Thus, the positive correlation between these two movies is in 
contrast with negative correlation between their higher-level 
concepts. 

This raises several questions: what is special about these 
two movies? Why do they stand out from other movies of 
the same genres, which tend not to be favored by the same 
users? Here are three potential explanations: 

(1) These are very good movies, and romance lovers 
who do not generally watch westerns make an exception for 
High Noon. 

(2) One of the movies was assigned to a wrong genre. 

(3) Despite the fact that these movies belong to different 
genres, they share something which is common to both of 
them, and thus they present a link between two higher-level 
abstractions. 

This is an example of a correlation which flips from neg- 
ative to positive when moving down the branches of the 
taxonomy tree into a more detailed level of abstraction. It 
demonstrates the surprising connection between the objects, 
and sets these objects apart from their siblings, which do not 
have contrasting behavior towards their generalizations. 

Correlations that flip from positive to negative can also 
be valuable, as can be seen with data from the Groceries 
dataset [5] in Figure 2(b). Here a negative correlation be- 
tween eggs and fish is highlighted by the fact that their 
generalizations are highly positively correlated. 

The novelty of the flipping correlation concept is in its
contrasting nature. Previously, the taxonomy information was 
used to characterize only positive correlations between items 
(in the form of association rules with significantly different 
confidence levels [17]) or to rank the surprisingness of 
frequent itemsets based on the distance between items in 
the taxonomy tree [6]. Unlike previous studies, we are
interested in patterns which present sharp flips between positive 
and negative correlations. 

Figure 2: Sample flipping correlations. 

¹ http://www.grouplens.org/node/12 

Thus, in this work we address the problem of efficiently 
computing all flipping correlations. In previous works, pat- 
tern pruning or deduplication was mainly performed as a 
post-processing step, after first computing all frequent item- 
sets. Because computing frequent itemsets can be a signifi- 
cant computational challenge, we develop a new method for 
the efficient computation of flipping correlations directly, by 
proposing novel pruning techniques based on new properties 
of the selected correlation measures. Moreover, instead of 
computing all positive and negative correlations and choos- 
ing the flipping among them, we push the contrast ("flip- 
ping") constraint into the mining process, and use it to im- 
prove the efficiency of our algorithm. 

We are solving the following problem: how to find flipping 
correlations without generating all frequent itemsets. This 
task is challenging because: (1) flipping patterns contain negative
correlations, which by definition occur in itemsets with very 
low support, (2) computing all frequent itemsets with very 
low support is computationally prohibitive, and (3) most 
of the correlation measures which can be applied to large 
datasets possess neither monotonicity nor anti-monotonicity 
properties, and as such cannot be straightforwardly used for 
pruning purposes. We solve this challenging problem by de- 
veloping new efficient pruning methods. 

In this work we make the following contributions: 

• We introduce flipping correlation patterns and formalize 
the problem of flipping correlation mining (Section 2). 

• We present new properties of selected correlation
measures, which allow direct pruning based on correlation 
values instead of support (Section 3). Based on these 
properties, we design an efficient solution for finding all 
flipping correlations in transactional databases supplied 
with taxonomy hierarchies (Section 4). The proposed 
solution is applicable to any known correlation measure 
that possesses null (transaction) invariance, including
measures that have never been used for pruning before, due 
to the lack of anti-monotonicity. 

• We evaluate the efficiency of new pruning techniques on 
a variety of synthetic and real datasets and demonstrate 
examples of non-trivial flipping correlations which could 
not be discovered using previous techniques (Section 5). 

The rest of the paper is organized as follows. In Section 
2 we formally define the problem of flipping pattern mining 
and discuss the selection of a suitable correlation measure. 
In Section 3 we present and prove new useful properties of 
these correlation measures. In Section 4 we present Flip- 
per, an algorithm for mining flipping correlation patterns. 
The experimental evaluation of Flipper is described in
Section 5. The previous work and how it differs from our study 
is discussed in Section 6. Section 7 concludes this study and 
offers avenues for future research. 

2. PRELIMINARIES 

We start by defining the problem of mining contrasting 
level-specific correlations, which we call flipping correlations. 
The first step is to choose the measure which is most suitable 
for our problem, and the second step is to define positive and 
negative correlations based on this measure. 

2.1 Correlation measure 

Various measures were proposed for assessing the degree 
of the correlation. A comprehensive comparison of 21 differ- 
ent correlation measures was given by Tan et al. [18]. The 
popular correlation measure Lift, accompanied by a χ² test 
for statistical significance [3], is based on support expectation. 
Other expectation-based measures are φ [2] and the
deviation from the expected [23]. To compute Lift, the items in 
the transactional database are treated as binary variables, 
and the expected support for an itemset containing both A 
and B is computed as

$$E(sup(AB)) = \frac{sup(A)}{N} \cdot \frac{sup(B)}{N} \cdot N,$$

where N is the total number of transactions. If sup(AB) > 
E(sup(AB)), then items A and B are positively correlated. 
Similarly, if sup(AB) < E(sup(AB)), then items A and B 
are negatively correlated. The degree of the positive or neg- 
ative correlation is measured by the degree of the deviation 
of the real support from the expected one. 

Unfortunately, the expectation-based measures are unre- 
liable when used for assessing the degree of the correlation 
in large transactional databases [22]. As an illustration,
consider the following example. 

Example 2. Consider two sample databases (DB1 and 
DB2) shown in Table 1. These databases differ only by N, 
but not by the number of transactions containing itemsets A, 
B, AB and C, D, CD. One can see that the relationship
between items A and B and that between C and D can be
classified either as a positive or as a negative correlation, solely 
depending on the total number of transactions N, instead of 
reflecting the true relationships between the items. For
example, C and D, though intuitively clearly negatively
correlated, are judged as positively correlated by the expectation-based
correlation measure in dataset DB1. Thus, expectation-based
correlation measures are unstable and cannot be used to produce 
meaningful positive and negative correlations. 

Since in large databases the number of transactions which 
contain a particular item is much smaller than the total
number of transactions N (small-probability event), the expected 
value for support for both itemsets AB and CD will be
extremely low in database DB1. Then even a very small actual 
support will be greater than the expected one, and both
correlations will be classified as positive. 

The degree of the expectation-based correlation is highly 
influenced by the number of null transactions [20, 22], i.e., 
transactions which do not contain items whose correlation 
has been measured. Hence, such measures are not suitable 
for the study of correlations in large datasets, where the 
number of null transactions could be large and unstable. 
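
The dependence on N can be made concrete with a small Python sketch. It fixes the support counts of two items and changes only the total number of transactions. The counts below are hypothetical illustration values, not the figures of Table 1, and the function names are our own.

```python
def expected_support(sup_a, sup_b, n):
    """Expected support of {A, B} under independence: (sup(A)/N) * (sup(B)/N) * N."""
    return sup_a * sup_b / n


def expectation_sign(sup_a, sup_b, sup_ab, n):
    """Classify a pair by comparing its actual support with the expected one."""
    expected = expected_support(sup_a, sup_b, n)
    if sup_ab > expected:
        return "positive"
    if sup_ab < expected:
        return "negative"
    return "independent"


# Hypothetical counts: C and D each occur 1000 times but co-occur only 10 times.
sup_c, sup_d, sup_cd = 1000, 1000, 10

# Only N changes between the two calls; the relevant transactions are identical.
small_db = expectation_sign(sup_c, sup_d, sup_cd, n=20_000)     # expected support 50
large_db = expectation_sign(sup_c, sup_d, sup_cd, n=1_000_000)  # expected support 1
```

Under these assumed counts the same pair is labeled negative in the small database and positive in the large one, although nothing about the transactions that actually contain C or D has changed.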

For our problem of contrasting positive and negative
correlations, it is crucial to adopt a reliable correlation measure 
that is unaffected by the number of null transactions 
present in the database. Such measures are called null 
(transaction)-invariant [20]. The main property of a
null-invariant measure is its independence of the total number of 
transactions N. 

According to the study of Wu et al. [22], all five known 
null-invariant measures can be viewed as a generalized mean 
of conditional probabilities. The conditional probabilities 
represent the fraction of transactions containing item a_i that also 
contain the rest of the items, and an average over these 
probabilities assesses the degree of the mutual connection 
between items in the itemset. Thus, the degree of this con- 
nection is based solely on the number of relevant transac- 
tions, i.e. the transactions that contain at least one item in 
the itemset to be evaluated. The five measures are summa- 
rized in Table 2. The ordering of the measures for the same 
conditional probabilities follows from the nature of a mean 
which they represent: 

Coherence(A_1, A_2) ≤ Cosine(A_1, A_2) ≤ Kulc(A_1, A_2) 
harmonic mean ≤ geometric mean ≤ arithmetic mean 

Depending on the support counts of single items, the
measures produce different results: if sup(A_1) is much larger 
than sup(A_2), the Coherence value of such an itemset tends 
to be small, no matter how strong the relationship
between the items is, while the Kulc value will be large if such a strong 
relationship exists. Hence, the different correlation measures 
are incomparable, and in order to handle both positive and 
negative correlations, it is better to use the same 
correlation measure throughout the entire mining process. 
A discussion of the choice of the most appropriate 
correlation measure can be found in the work of Wu et al. [22]. 
Our method can be performed using any null-invariant
measure. As an illustration, we use Kulczynski (denoted as 
Kulc) for our experiments. Kulc is a relaxed measure and 
it handles unbalanced itemsets better than Coherence and 
Cosine [22]. Similar to Cosine and Max confidence, mining 
correlations using Kulc is also a computationally 
challenging case, since these three measures are not
anti-monotonic. 
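
The ordering of the means can be checked numerically. The following Python sketch computes the conditional probabilities sup(A)/sup(a_i) for one unbalanced pair and evaluates the harmonic, geometric and arithmetic means, which the ordering above associates with Coherence, Cosine and Kulc respectively, together with the minimum and maximum (All confidence and Max confidence). The support counts are hypothetical, and the sketch assumes sup(A) > 0 so that the harmonic mean is defined.

```python
import math


def conditional_probabilities(sup_itemset, item_supports):
    """P(A | a_i) = sup(A) / sup(a_i) for every item a_i of the itemset A."""
    return [sup_itemset / s for s in item_supports]


def harmonic_mean(ps):
    return len(ps) / sum(1.0 / p for p in ps)


def geometric_mean(ps):
    return math.prod(ps) ** (1.0 / len(ps))


def arithmetic_mean(ps):
    return sum(ps) / len(ps)


# Hypothetical unbalanced pair: sup(A_1) = 1000, sup(A_2) = 10, sup(A_1 A_2) = 9.
ps = conditional_probabilities(9, [1000, 10])

means = {
    "min": min(ps),                     # All confidence
    "harmonic": harmonic_mean(ps),      # Coherence-like
    "geometric": geometric_mean(ps),    # Cosine
    "arithmetic": arithmetic_mean(ps),  # Kulc
    "max": max(ps),                     # Max confidence
}
```

For this assumed pair the harmonic mean stays close to the small conditional probability, while the arithmetic mean is pulled up by the large one, which illustrates why the measures are not comparable with each other on unbalanced itemsets.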

Let Corr be one of the null-invariant correlation measures 
from Table 2. We formally define positive and negative
correlations as follows. Recall that an itemset is frequent if its 
support is not less than a minimum support threshold θ
predefined by a domain expert. For our problem, the minimum 
support threshold θ can be arbitrarily low. 

Definition 1. Null-invariant correlations. Items in a 
k-itemset A = {a_1, ..., a_k} are positively correlated with a 
correlation measure Corr if A is frequent and Corr(A) > 
γ for a positive correlation threshold γ. Items in A are 
negatively correlated if A is frequent and Corr(A) < ε for a 
negative correlation threshold ε. 

In our experiments we use Corr(A) = Kulc(A). However, 
in the next section we show that all the techniques developed 
here are applicable to any known null-invariant measure, 
and the efficiency of our new algorithm is not influenced 
by the concrete choice of the correlation measure. Based 
on Definition 1 we formally define the problem of flipping 
correlations mining. 



* This is a re-definition of Coherence which preserves the
features and the ordering of the original Coherence
measure [22]. 
Table 1: Examples of the expectation-based correlation. 

Table 2: Definitions of five null-invariant correlation measures. 



2.2 Flipping pattern 

Let I be a set of items, and let T and D be two
independent sources of information about these items. The 
taxonomy tree T represents the mapping of the items into 
several levels of abstraction. Each internal node of the
taxonomy tree represents a higher-level abstraction for a group 
of items, and is itself an item. The leaves in T represent 
the most specific items, and internal nodes represent more 
general items. The root of T represents all items in I, and 
is considered to be at abstraction level 0. Since there is 
only one node at level 0, and we cannot compute correlation for 
a single item, we exclude the root node from further 
consideration. Let the height H of the taxonomy tree T be the 
number of nodes on the path from the top level 1 to the deepest leaf. 
Then there are H different abstraction levels in the tree, 
and each node belongs to some level. 

While T summarizes the intrinsic similarity relationships 
between items, additional information about the relative 
behavior of the same items is presented as a set D of
observations, or transactions. This is the source of the information 
about the correlation of different items. 

Recall that any combination of k unique items from I 
forms a k-itemset. The support of itemset A = {a_1, ..., a_k}, 
sup(A), is the number of transactions containing all items 
from A. In terms of support we measure the correlation 
between items in A as: 

$$Kulc(A) = \frac{1}{k} \sum_{i=1}^{k} \frac{sup(A)}{sup(a_i)} \qquad (1)$$

According to Definition 1, the correlation between items in A 
is positive if Corr(A) is greater than a user-specified positive 
threshold γ, and it is negative if Corr(A) is less than a
user-specified negative threshold ε. If neither of these conditions 
holds, the items in A are considered non-correlated, and not 
interesting. For convenience, we call the itemset where the 
items are positively correlated a positive itemset, and where 
the items are negatively correlated a negative itemset. 
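
Equation (1) and the labeling rule translate directly into code. The Python sketch below is a minimal illustration under stated assumptions: transactions are modeled as Python sets, every item of the itemset occurs at least once, and the minimum support condition of Definition 1 is omitted for brevity. The toy transactions and function names are hypothetical.

```python
def support(transactions, itemset):
    """Number of transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def kulc(transactions, itemset):
    """Kulc(A) = (1/k) * sum_i sup(A) / sup(a_i), as in Equation (1)."""
    sup_all = support(transactions, itemset)
    return sum(sup_all / support(transactions, [a]) for a in itemset) / len(itemset)


def label(corr, gamma, epsilon):
    """Positive above gamma, negative below epsilon, non-correlated otherwise."""
    if corr > gamma:
        return "positive"
    if corr < epsilon:
        return "negative"
    return "non-correlated"


# Hypothetical toy database of 8 transactions over items x, y, z.
transactions = [
    {"x", "y"}, {"x", "y"}, {"x", "y"}, {"x"},
    {"y", "z"}, {"z"}, {"z"}, {"z"},
]

xy = kulc(transactions, ["x", "y"])  # sup(xy) = 3, sup(x) = sup(y) = 4
yz = kulc(transactions, ["y", "z"])  # sup(yz) = 1, sup(y) = sup(z) = 4
labels = (label(xy, 0.6, 0.35), label(yz, 0.6, 0.35))
```

With the thresholds γ = 0.6 and ε = 0.35, {x, y} is a positive itemset and {y, z} is a negative itemset in this toy database.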

The correlation can be computed between different nodes 
of T at the same level of the hierarchy, if we replace the 
items in transactions by their higher-level generalizations. 
An (h, k)-itemset (1 ≤ k ≤ |I|, 1 ≤ h ≤ H) is defined as 
a set of k items from I, replaced by their corresponding 
generalizations from level h of the taxonomy tree. 

The goal is to find all positive and negative correlations 
between the nodes of taxonomy tree T at the same level 
of abstraction. We are interested only in the level-specific 
correlations of a contrasting nature, i.e., if the correlation 
between nodes is positive, then the correlation between their 
minimal generalizations is negative and vice versa. We say 
that the correlation flips from level to level. 

Definition 2. Flipping pattern. A k-itemset A
represents a flipping (correlation) pattern if all (h, k)-itemsets, 
obtained by replacement of items in A with their
corresponding minimal generalizations, have flipping correlation 
labels. In other words, if an (h, k)-itemset is positive, then 
an (h + 1, k)-itemset is negative, and vice versa. 

Since the goal is to find correlations between different 
items at each level of the hierarchy, all items in a flipping 
correlation pattern are descendants of different nodes at hi- 
erarchy level 1. 

By Definition 2, a flipping correlation pattern is an itemset 
which has flipping correlations across the entire height H 
of the taxonomy tree. Note that this definition is general 
enough to satisfy any possible user query for contrasting 
level-specific correlations: if the level-specific correlations 
are required for a specific subset of all levels, all that needs 
to be changed is the input to the algorithm, which would be 
a truncated taxonomy tree containing these specific levels of 
interest. 
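
Definition 2 can be illustrated with a short Python sketch. It assumes a balanced taxonomy given as a mapping from each leaf to its list of ancestors from level 1 down to the leaf itself, uses Kulc as Corr, and omits the minimum support thresholds. The data, the `paths` representation and the function names are hypothetical choices for illustration, not the Flipper algorithm.

```python
def generalize(transactions, paths, level):
    """Replace every leaf item by its ancestor at the given level (1 = most general)."""
    return [{paths[item][level - 1] for item in t} for t in transactions]


def kulc(transactions, itemset):
    """Kulc of an itemset; assumes every item occurs in at least one transaction."""
    sup_all = sum(1 for t in transactions if set(itemset) <= t)
    return sum(
        sup_all / sum(1 for t in transactions if a in t) for a in itemset
    ) / len(itemset)


def is_flipping(transactions, paths, leaves, gamma, epsilon):
    """True if the correlation labels alternate on every pair of adjacent levels."""
    height = len(paths[leaves[0]])
    labels = []
    for level in range(1, height + 1):
        level_items = [paths[leaf][level - 1] for leaf in leaves]
        corr = kulc(generalize(transactions, paths, level), level_items)
        if corr > gamma:
            labels.append("positive")
        elif corr < epsilon:
            labels.append("negative")
        else:
            return False  # a non-correlated level breaks the pattern
    return all(a != b for a, b in zip(labels, labels[1:]))


# Hypothetical two-level taxonomy: categories a and b, two leaves each.
paths = {"a1": ["a", "a1"], "a2": ["a", "a2"], "b1": ["b", "b1"], "b2": ["b", "b2"]}
transactions = [{"a1", "b1"}] * 2 + [{"a2"}] * 8 + [{"b2"}] * 8

flips = is_flipping(transactions, paths, ["a1", "b1"], gamma=0.6, epsilon=0.35)
no_flip = is_flipping(transactions, paths, ["a2", "b2"], gamma=0.6, epsilon=0.35)
```

In this toy database the categories a and b are negatively correlated (Kulc = 0.2), while the leaves a1 and b1 always occur together (Kulc = 1), so {a1, b1} flips. The pair {a2, b2} is negative on both levels and is therefore not a flipping pattern.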

Since we target the correlations at the same level of
abstraction, if the depth of some leaf node v_i is 
less than H, it is the user's responsibility to define the missing 
corresponding generalizations of v_i. In Figure 3 we show 
some possible methods of dealing with such situations. In 
our experiments we rebalanced the tree by adding additional 
copies of v_i as its descendants up to depth H (Figure 3 [B]). 
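
A minimal Python sketch of the rebalancing variant [B], assuming the taxonomy is stored as a mapping from each leaf to its list of ancestors from level 1 down to the leaf itself. The representation and the sample items are hypothetical.

```python
def rebalance(paths, height):
    """Pad every path to `height` levels by repeating the leaf (variant [B])."""
    return {
        leaf: path + [leaf] * (height - len(path))
        for leaf, path in paths.items()
    }


# Hypothetical unbalanced taxonomy: one leaf at depth 3, one at depth 2.
paths = {
    "whole milk": ["grocery", "milk", "whole milk"],
    "newspaper": ["media", "newspaper"],
}
balanced = rebalance(paths, height=max(len(p) for p in paths.values()))
```

After padding, every leaf has an ancestor on each of the H levels, so an (h, k)-itemset is defined for every h.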

To demonstrate that replacing items by their
generalizations may indeed drastically change the degree of the
correlation, consider the following example. 

Example 3. In Figure 4, we show a toy example of 10 
transactions and a taxonomy tree of the corresponding items. 
The input database has 8 different items from 2 different
categories a and b. Items in each transaction can be substituted 
by their generalizations. Given positive threshold γ = 0.6 
and negative threshold ε = 0.35, we find that there is only 
one itemset, {a11, b11}, which is a flipping correlation
pattern (Figure 5). ■ 
Figure 3: Variants of re-balancing the levels of the
taxonomy tree: [A] truncate the tree by leaving only
consistent levels; [B] consider the copies of leaf nodes 
as their generalizations. 

Figure 4: A toy example of a taxonomy and a 
database of 10 transactions. 

For mining frequent itemsets at different levels of T, it is 
useful to define different minimum supports for each level, 
because items at lower levels of abstraction are unlikely to 
occur as frequently as those at higher levels. If we use 
low support for the highest hierarchy level, we will end 
up with too many branching itemsets. We assume that 
a set {θ_1, ..., θ_H} of non-increasing support thresholds is 
provided as an input to our algorithm. Though we use 
support-based pruning in our computation, we do not 
rely on it primarily. Hence, the support thresholds can be set 
arbitrarily low. 

With the above definitions, the flipping pattern mining
problem can be stated as follows: 

Input: A set of transactions D, a taxonomy tree T, a set of 
thresholds: γ, ε, and minimum support θ_h for 1 ≤ h ≤ H. 
Output: All flipping correlations satisfying the thresholds. 




Figure 5: An example of a flipping pattern from 
dataset in Figure 4. 



3. NEW PROPERTIES 

Before presenting the solution to the problem of flipping 
correlation mining, we describe and prove useful mathemat- 
ical properties common to all null-invariant correlation mea- 
sures. These properties constitute the basis of an efficient 
flipping pattern mining algorithm, presented in Section 4. 

The anti-monotonicity of support allows us to systematically 
control the exponential growth of candidate itemsets. A 
superset of a non-frequent itemset A cannot be frequent: if we 
add one more item to A, the new item combination 
cannot occur more often than A. In contrast, adding an 
additional item to A may increase or decrease the value of 
the correlation between items in the new itemset: most
null-invariant correlation measures, being generalized means, are 
not anti-monotonic. The lack of anti-monotonicity poses 
a significant challenge if the efficiency of support-based
candidate pruning is to be enhanced with correlation-based
pruning. The two properties of correlation measures 
presented below are intended to overcome this limitation. 

3.1 Correlation upper bound 

The following theorem proves an upper bound of a corre- 
lation value of a superset in terms of correlation values of 
its subsets. It reflects an intuitive observation that corre- 
lation of a superset cannot be positive if all its subsets are 
non-positive. 

Theorem 1. Correlation upper bound 

For a k-itemset A and the set S of all (k-1)-subitemsets of A, 
Corr(A) ≤ max_{B ∈ S} Corr(B). 

Proof. Let A = {a_1, ..., a_k}, and let B^i = A - {a_i} be the 
subset of A which contains all elements of A except a_i, for 
i = 1, ..., k. 

Because sup(B^i) ≥ sup(A) (anti-monotonicity of support), P(A|a_j) ≤ 
P(B^i|a_j) for any 1 ≤ j ≤ k. Hence, the theorem trivially 
holds for All confidence and for Max confidence (minimum 
and maximum of conditional probabilities respectively). The 
proof for Coherence, which essentially is sup(A) divided by 
the support of all transactions containing any item from A
(intersection over union), is straightforward: the numerator in 
the formula of Coherence is non-increasing, and the
denominator is non-decreasing, when one more item is added to itemset 
B^i. Only Kulc and Cosine require special treatment. 

Proof for Kulc. The arithmetic mean of the Kulc values of 
all B^i is: 

$$\frac{1}{k} \sum_{i=1}^{k} Kulc(B^i) = \frac{1}{k} \sum_{i=1}^{k} \frac{sup(B^i)}{k-1} \sum_{j \neq i} \frac{1}{sup(a_j)}.$$

Here each element 1/sup(a_j) appears in the sum (k-1) times. 
Since sup(B^i) ≥ sup(A), replacing sup(B^i) with sup(A) 
gives the following inequality: 

$$\frac{1}{k} \sum_{i=1}^{k} Kulc(B^i) \ge \frac{sup(A)}{k} \left( \frac{1}{sup(a_1)} + \cdots + \frac{1}{sup(a_k)} \right) = Kulc(A).$$

Since the maximum is not smaller than the arithmetic 
mean, we have proven that Kulc(A) ≤ max_{B ∈ S} Kulc(B). 
Proof for Cosine. The geometric mean of the Cosine values 
of all B^i is: 

$$\sqrt[k]{\prod_{i=1}^{k} Cosine(B^i)} = \sqrt[k]{\prod_{i=1}^{k} \frac{sup(B^i)}{\sqrt[k-1]{\prod_{j \neq i} sup(a_j)}}}.$$

Each element $\sqrt[k-1]{sup(a_j)}$ is multiplied (k-1) times. Since 
sup(B^i) ≥ sup(A), replacing sup(B^i) with sup(A) again 
gives: 

$$\sqrt[k]{\prod_{i=1}^{k} Cosine(B^i)} \ge \frac{sup(A)}{\sqrt[k]{sup(a_1) \times \cdots \times sup(a_k)}} = Cosine(A).$$

Since the maximum is not smaller than the geometric mean, 
we have proven that Cosine(A) ≤ max_{B ∈ S} Cosine(B). 

This completes the proof of Theorem 1 for all five null- 
invariant correlation measures. ■ 

The following corollary follows directly from Theorem 1: 

Corollary 1. If all (k-1)-subitemsets of a k-itemset A 
are non-positive, then A cannot be positive. 
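
As a sanity check rather than a proof, the upper bound of Theorem 1 can be tested by brute force for Kulc. The Python sketch below generates a hypothetical random database and verifies the bound for all 3- and 4-itemsets whose items occur at least once. The database, the seed and the tolerance are arbitrary choices.

```python
import itertools
import random


def support(transactions, itemset):
    """Number of transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def kulc(transactions, itemset):
    """Kulc(A) = (1/k) * sum_i sup(A) / sup(a_i)."""
    sup_all = support(transactions, itemset)
    return sum(sup_all / support(transactions, [a]) for a in itemset) / len(itemset)


def upper_bound_holds(transactions, itemset, tol=1e-12):
    """Check Kulc(A) <= max over all (k-1)-subitemsets B of Kulc(B)."""
    subsets = itertools.combinations(itemset, len(itemset) - 1)
    best = max(kulc(transactions, b) for b in subsets)
    return kulc(transactions, itemset) <= best + tol


random.seed(0)
items = list("abcdef")
# Hypothetical random database: each transaction is a non-empty random subset.
transactions = [set(random.sample(items, random.randint(1, 4))) for _ in range(200)]
# Keep only items that occur, so that no single-item support is zero.
present = [a for a in items if support(transactions, [a]) > 0]

violations = [
    a
    for k in (3, 4)
    for a in itertools.combinations(present, k)
    if not upper_bound_holds(transactions, a)
]
```

An empty `violations` list is consistent with Theorem 1. In a level-wise search this is what justifies skipping a k-itemset as a positive candidate when none of its (k-1)-subitemsets is positive.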

3.2 Itemsets with a special single item 

Recall that if we use a measure Corr which is not
anti-monotonic, we cannot conclude that a superset of some itemset 
A is non-positive, even if Corr(A) < γ. However, in the
following, we claim that for an item a with special properties, 
knowing the correlation values of all (k-1)-itemsets containing 
this item a allows us to evaluate all k-itemsets containing a. 

Theorem 2. For a k-itemset A = {a_1, ..., a_k} and all its 
(k-1) subsets of size (k-1) which share the same single item 
a, if (1) the correlation values for all these subsets are below 
γ and (2) the support of at least one item a_i ≠ a in A 
is greater than or equal to sup(a), then the correlation between 
items in A is below γ. 

Proof. Theorem trivially holds for Coherence and All 
confidence, which are anti-monotonic. For anti-monotonic 
measures, if correlation values for all itemsets containing 
item a are below threshold, then none of their supersets 
can be positive. Hence, Theorem 2 holds without condition 
(2). The proofs for Kulc, Cosine and Max Confidence are 
presented below. 

Assume, without loss of generality, that the first item of A
is a_1 = a and that the last item a_k has the largest support
among all single items in A. 

Proof for Kulc. By simple algebra we can show that: 

$$Kulc(A) = \frac{sup(A)}{k} \sum_{i=1}^{k} \frac{1}{sup(a_i)} \le \frac{sup(A - \{a_k\})}{k-1} \sum_{i=1}^{k-1} \frac{1}{sup(a_i)} = Kulc(A - \{a_k\}) < \gamma,$$

where A - {a_k} represents the (k-1)-subset of A which does 
not contain item a_k, and, by condition (1), its correlation is 
below the positive threshold, as for any of the (k-1)-itemsets 
containing a_1. This proves Theorem 2 for Kulc. 
Figure 6: Search space for flipping correlations. 

Proof for Cosine. By similar simple algebra, we can 
show that 

$$\sqrt[k-1]{sup(a_1) \times \cdots \times sup(a_{k-1})} \le \sqrt[k]{sup(a_1) \times \cdots \times sup(a_k)}.$$

Then 

$$Cosine(A) = \frac{sup(A)}{\sqrt[k]{sup(a_1) \times \cdots \times sup(a_{k-1}) \times sup(a_k)}} \le \frac{sup(A)}{\sqrt[k-1]{sup(a_1) \times \cdots \times sup(a_{k-1})}} \le \frac{sup(A - \{a_k\})}{\sqrt[k-1]{sup(a_1) \times \cdots \times sup(a_{k-1})}} = Cosine(A - \{a_k\}) < \gamma.$$

This completes the proof for Cosine. 

The proof for Max confidence is straightforward: if all 
(k-1)-itemsets which contain item a_1 are non-positive, we 
can always represent a superset of any of them as the result of adding 
one more item a_k, whose support is the maximum among 
all supports of a_i. By condition (2) we know that such an item 
a_k exists and is different from a_1. However, the conditional 
probability we are adding as an argument to the max
function has a numerator which is non-increasing (sup(A)), and a 
denominator which is the greatest among all supports
considered for a (k-1)-subitemset. Hence, we cannot create a 
positively correlated itemset by adding this new item. 

We have proven that Theorem 2 holds for all five null- 
invariant correlation measures. ■ 

The following corollary follows directly from Theorem 2. 

Corollary 2. If the maximum Corr for all k-itemsets 
containing item a is less than γ, and item a has the smallest 
support among the single items existing in the database, then 
Corr of all k'-itemsets containing a is less than γ for k' > k. 

Proof. Each (k + 1)-itemset A' which contains a can be 
thought of as an extension of some k-itemset containing a 
with an item a_{k+1}, which has the largest support among all 
the items in A' (since we know that the support of a is not the 
largest). Then, by Theorem 2, Corr(A') < γ. Since all
k-itemsets containing item a have Corr value less than γ, all 
(k + 1)-itemsets containing a have Corr value less than γ. 
Iteratively applying Theorem 2, now to the extension of
(k + 1)-itemsets into (k + 2)-itemsets containing a, we conclude that 
none of the k'-itemsets containing a is positive, for k' > k. ■ 
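
Corollary 2 suggests a simple test for dropping an item from all further positive candidates. The Python sketch below is one possible reading of that test for Kulc, not the pruning routine of Flipper itself. The function name, the toy database and the threshold are hypothetical.

```python
import itertools


def support(transactions, itemset):
    """Number of transactions that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def kulc(transactions, itemset):
    """Kulc(A) = (1/k) * sum_i sup(A) / sup(a_i)."""
    sup_all = support(transactions, itemset)
    return sum(sup_all / support(transactions, [a]) for a in itemset) / len(itemset)


def prunable_by_corollary_2(transactions, items, a, k, gamma):
    """True if no k'-itemset containing `a` can be positive for k' > k."""
    sup_a = support(transactions, [a])
    # Condition: `a` has the smallest support among all single items.
    if any(support(transactions, [b]) < sup_a for b in items):
        return False
    others = [b for b in items if b != a]
    # Condition: every k-itemset containing `a` has Corr below gamma.
    return all(
        kulc(transactions, (a,) + rest) < gamma
        for rest in itertools.combinations(others, k - 1)
    )


# Hypothetical database: item a is rare and weakly tied to b, c, d.
items = ["a", "b", "c", "d"]
transactions = (
    [{"a", "b", "c", "d"}] + [{"a"}] * 3 + [{"b", "c", "d"}] * 6
    + [{"c"}] * 2 + [{"d"}] * 2
)

prune_a = prunable_by_corollary_2(transactions, items, "a", k=2, gamma=0.6)
prune_b = prunable_by_corollary_2(transactions, items, "b", k=2, gamma=0.6)

# Kulc of every larger itemset containing a, to compare against gamma.
larger = [
    kulc(transactions, ("a",) + rest)
    for k in (3, 4)
    for rest in itertools.combinations(["b", "c", "d"], k - 1)
]
```

Here item a satisfies both conditions for k = 2, and every 3- and 4-itemset containing a indeed stays below γ = 0.6. Item b does not qualify, because its support is not the smallest.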

4. FLIPPER ALGORITHM 

This section describes our solution to the flipping
pattern mining problem. The goal is to extract all sequences of 
(h, k)-itemsets such that (1) each itemset is either positive 
or negative, (2) any two consecutive (h-1, k)- and
(h, k)-itemsets have opposite correlation signs (flip), and (3) the 
flipping sequence continues unbroken from the top to the 
bottom of the hierarchy. Our search space includes all
(h, k)-itemsets for 1 ≤ h ≤ H and 2 ≤ k ≤ K, where K is the 
number of distinct items in the largest possible itemset. We 
model the search space as a two-dimensional table M
presented in Figure 6. Each cell Q_{h,k} of this table contains 
k-itemsets, where items are substituted by their
generalizations from hierarchy level h. 

4.1 General framework 

The number of possible itemsets in each cell of table M
grows exponentially from left to right and from the top to
the bottom. This suggests the top-down and left-to-right
directions for exploring table M. For convenience, we call
the extension of (h, k)-itemsets into (h+1, k)-itemsets the
vertical pattern growth, and the extension of (h, k)-itemsets
into (h, k+1)-itemsets the horizontal pattern growth.
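
The two growth directions can be illustrated with a short Python
sketch. It is only an illustration under simplifying assumptions:
the taxonomy is given as a hypothetical `children` dictionary,
itemsets are frozensets of node names, no pruning is applied, and
the function names are illustrative rather than part of Flipper.

```python
from itertools import product


def vertical_growth(itemset, children):
    """Extend an (h, k)-itemset into all (h+1, k)-itemsets.

    Each item is replaced by one of its children in the taxonomy.
    `children` is a hypothetical dict: node -> list of child nodes.
    """
    slots = [children[item] for item in sorted(itemset)]
    return [frozenset(combo) for combo in product(*slots)]


def horizontal_growth(itemset, level_items):
    """Extend an (h, k)-itemset into (h, k+1)-itemsets by adding one
    item from the same hierarchy level (no pruning in this sketch)."""
    return [itemset | {item} for item in level_items if item not in itemset]


# Toy taxonomy (made up for illustration).
children = {"drinks": ["beer", "wine"], "baby": ["diapers", "wipes"]}
parent = frozenset({"drinks", "baby"})

kids = vertical_growth(parent, children)   # the (h+1, 2)-itemsets
wider = horizontal_growth(parent, ["drinks", "baby", "meat"])
```

Every child combination is produced here; in the actual search most
of these candidates would be discarded by the pruning rules of
Sections 4.2 and 4.3.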

By Definition 2 all items in a flipping pattern are
descendants of the generalizations at level 1 of the taxonomy
tree, and hence the maximum number of columns in M is
bounded by the total number of different nodes at level 1
of the taxonomy tree, or by the maximum number of distinct
items in the same transaction (max transaction width),
whichever is less.

For each cell, we first generate the set of candidates, based
on the information from the previously computed cells to the
left and above it, and then we perform the support counting
for these candidates, by checking each transaction for
the corresponding item combination. This framework
represents a two-dimensional modification of the level-wise
frequent pattern mining known as the Apriori algorithm. The
efficiency of such level-wise processing depends on the
efficiency of pruning: some itemsets in Q_{h,k} can be excluded
from the set of candidates, if we can infer - from already
computed values in cells Q_{h-1,k} and Q_{h,k-1} - that they
cannot be a part of a flipping pattern. The pattern growth
terminates when for some cell the set of the candidates is
empty.
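
The support-counting step for one cell can be sketched as follows.
This is a minimal Python illustration, not the authors'
implementation: `to_level` is a hypothetical lookup from each leaf
item to its ancestor at the level being counted, and `candidates`
is an optional set of sorted tuples restricting which itemsets are
counted.

```python
from collections import Counter
from itertools import combinations


def count_cell(transactions, to_level, k, candidates=None):
    """Count supports of k-itemsets at one hierarchy level in one scan.

    Items of each transaction are first replaced by their ancestors at
    the chosen level (via the hypothetical `to_level` mapping), then
    every k-combination that is a candidate is counted once.
    """
    counts = Counter()
    for transaction in transactions:
        generalized = sorted({to_level[item] for item in transaction})
        for combo in combinations(generalized, k):
            if candidates is None or combo in candidates:
                counts[combo] += 1
    return counts


# Toy data (made up for illustration).
transactions = [{"beer", "diapers"}, {"beer", "wipes"}, {"wine", "diapers"}]
to_level_1 = {"beer": "drinks", "wine": "drinks",
              "diapers": "baby", "wipes": "baby"}
to_level_2 = {item: item for item in to_level_1}  # leaves map to themselves

top = count_cell(transactions, to_level_1, 2)
bottom = count_cell(transactions, to_level_2, 2,
                    candidates={("beer", "diapers")})
```

Restricting the scan to `candidates` is what makes pruning pay off:
the fewer candidates survive from the neighbouring cells, the less
work each database scan has to do.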

In the following, we describe how we modify this basic 
framework using the definition of a flipping pattern and the 
correlation properties presented in Section 3. These opti- 
mizations lead to a significant reduction of the number of 
candidates in each cell and to the early termination of the 
pattern growth across both dimensions, which accounts for 
the high efficiency of our mining algorithm. 

4.2 Basic pruning 

4.2.1 Pruning by support 

Since by Definition 1 positive and negative correlations
are computed only between items in frequent itemsets, the
first pruning and stopping criterion is pruning by support. If
itemset A is non-frequent, it is extended neither horizontally
nor vertically, and the combination of items in A is
excluded from further consideration. A is not extended
vertically, since we want all itemsets in a flipping sequence
to be either positive or negative, and A is neither; hence
it breaks the sequence of flipping correlations. There is no
need to extend A horizontally either, since for the same
support threshold θ_h no supersets of A are frequent, and
hence they break the flipping sequence also in subsequent
columns of M. Normally, this anti-monotone property of
support accounts for efficient pruning. However, in our
case, the arbitrarily low support thresholds have an adverse
effect, producing an exponentially large number of candidates,
which cannot be counted by a single database scan.
The pattern growth terminates very late, and pruning by
support has limited value for flipping correlation mining.

4.2.2 Pruning non-flipping itemsets 

Definition 2 of flipping patterns requires that any two
consecutive itemsets A_p (parent) and A_c (child) have opposite
correlation signs. Thus, if both A_p and A_c are non-positive
(or both are non-negative), they break a flipping sequence
and do not need to be extended vertically. However, a
superset of A_c can still be a part of a flipping pattern, since
we cannot predict the correlation value of each superset of
A_c based only on the correlation of items in A_c. Thus we
need to count all supersets of A_p and A_c till the end of the
corresponding rows in table M. Hence, we cannot prune
supersets of A_c from the candidates until we finish processing
two consecutive rows. This suggests the row-by-row order of
processing, since the combination of items in A_c should be
kept anyway to generate corresponding candidate supersets.
When at least 2 rows are completed, all such non-flipping
itemsets are excluded from further consideration. Since the
final pruning of non-flipping itemsets is performed after the
entire row has been processed, this pruning has a limited
efficiency as well.

Figure 7: (a) Termination of pattern growth if all
correlations in two vertically consecutive cells are
negative; and (b) Order of processing for the early
termination of pattern growth.
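
The flipping test of Section 4.2.2 can be written down in a few
lines of Python. The sketch assumes one particular reading of
Definition 1 (a frequent itemset is positive when its correlation
is at least the positive threshold and negative when it is at most
the negative threshold); the function names and the toy numbers are
illustrative only.

```python
def label(corr, support, gamma, epsilon, min_support):
    """Return '+', '-' or None for one itemset.

    Assumed reading of Definition 1: a frequent itemset is positive
    when Corr >= gamma and negative when Corr <= epsilon; infrequent
    or in-between itemsets get no label.
    """
    if support < min_support:
        return None
    if corr >= gamma:
        return "+"
    if corr <= epsilon:
        return "-"
    return None


def is_flipping_chain(labels):
    """True when every level is labeled and consecutive levels alternate."""
    if len(labels) < 2 or None in labels:
        return False
    return all(parent != child for parent, child in zip(labels, labels[1:]))


# Toy correlation and support values for one itemset at three levels.
chain = [label(0.45, 120, 0.3, 0.1, 50),   # level 1
         label(0.05, 60, 0.3, 0.1, 20),    # level 2
         label(0.60, 15, 0.3, 0.1, 5)]     # level 3
```

A chain is rejected as soon as one level is unlabeled or two
consecutive levels carry the same sign, which is exactly the
situation in which vertical extension can stop.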

4.3 Advanced pruning 

4.3.1 Early termination of the pattern growth 

Theorem 1 presents an upper bound for the correlation value
of a k-itemset A in terms of its sub-itemsets. Corollary 1
suggests that once the correlation values of all (k-1)-subitemsets
of A fall below the positive correlation threshold γ, we conclude
that Corr(A) < γ. However, this does not imply that all
supersets of A are non-positive: there might be a superset
of A, A' = {a_1, ..., a_{k+1}}, with Corr(A') > γ, since the
conditions of Theorem 1 may not hold due to the newly added
item a_{k+1}. However, if all (h, k)-itemsets in cell Q_{h,k} are
non-positive, then they cannot be combined into a positive
itemset in cell Q_{h,k+1}.

Now, suppose that all (h, k)-itemsets in cell Q_{h,k} and all
(h+1, k)-itemsets in cell Q_{h+1,k} are non-positive. Then,
according to the flipping-based pruning, we can terminate only
the vertical extension to the next abstraction level. The
following theorem proves that there are no flipping
correlation patterns also to the right of column k, and thus we
can terminate both the vertical and the horizontal pattern
growth.

Theorem 3. Termination of the pattern growth (TPG).
If all itemsets in Q_{h,k} and Q_{h+1,k} are non-positive, there
are no flipping patterns in any column k' for k' > k.

Proof. For every parent itemset A_p in Q_{h,k+1} and child
itemset A_c in Q_{h+1,k+1}, we know that they are non-positive
by Corollary 1. By induction, we conclude that any
(h, k')-itemset and (h+1, k')-itemset which are supersets of A_p and
A_c respectively (k' > k+1) are non-positive. Therefore, any
k'-itemset with k' > k is not a part of a flipping pattern. ■

Finding that all itemsets in two subsequent cells in the
same column are non-positive allows us to terminate the
pattern growth. For example, if all patterns of Q_{1,3} and
Q_{2,3} are negative as illustrated in Figure 7(a), then, based
on Theorem 3 (TPG principle), we do not need to explore
any cell below and to the right of cell Q_{2,3}.

In order to be able to check the termination condition at
each step of the algorithm, we need to have at hand the
results for two consecutive cells Q_{h,k} and Q_{h+1,k} in column k.
Hence, the row-wise processing is adjusted. We first
compute the two upper rows of the search space table in zigzag
order, as illustrated in Figure 7(b):
Q_{1,2} → Q_{2,2} → Q_{1,3} → Q_{2,3} → ... → Q_{1,k} → Q_{2,k},
until either the TPG termination condition is satisfied, or
all itemsets in some cell are infrequent.
Then, we process the remaining rows, one at a time. This
ensures that we always have two cells in subsequent
hierarchy levels to which the termination principle can be applied.
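
The zigzag traversal with the TPG stopping rule can be sketched in
Python as follows. This is an illustration under assumptions, not
the paper's code: `evaluate_cell` is a hypothetical callback that
returns a dictionary mapping each frequent itemset of a cell to its
correlation value, and the toy table holds made-up numbers.

```python
def tpg(parent_cell, child_cell, gamma):
    """TPG test: True when no itemset in either cell reaches gamma."""
    return (all(corr < gamma for corr in parent_cell.values())
            and all(corr < gamma for corr in child_cell.values()))


def explore_top_rows(evaluate_cell, max_k, gamma):
    """Visit Q(1,2), Q(2,2), Q(1,3), Q(2,3), ... and stop early.

    `evaluate_cell(h, k)` is a hypothetical callback returning
    {itemset: Corr} for the frequent itemsets of cell Q(h, k).
    """
    visited = []
    for k in range(2, max_k + 1):
        parent_cell = evaluate_cell(1, k)
        child_cell = evaluate_cell(2, k)
        visited += [(1, k), (2, k)]
        # An empty cell means every candidate in it was infrequent.
        if not parent_cell or not child_cell:
            break
        if tpg(parent_cell, child_cell, gamma):
            break
    return visited


# Toy table: column 3 is entirely non-positive, so columns 4 and 5
# are never requested (a lookup there would raise KeyError).
toy = {
    (1, 2): {"ab": 0.50}, (2, 2): {"xy": 0.05},
    (1, 3): {"abc": 0.10}, (2, 3): {"xyz": 0.20},
}
order = explore_top_rows(lambda h, k: toy[(h, k)], max_k=5, gamma=0.3)
```

The same two-cell check applies to the remaining rows, with the
pair (1, 2) replaced by the rows h-1 and h.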

4.3.2 Pruning single items and their supersets 

Corollary 2 suggests a pruning method for all itemsets
containing a single item a, which satisfies the following
conditions: (1) item a has the smallest support among single
items existing in the database, and (2) all k-itemsets
containing a are non-positive. This pruning is performed as
follows.

Let X_h be the complete set of items at abstraction level h.
The items from each X_h are sorted by support and are kept
in list L_h. Now, while computing (h, k)-itemsets in cell Q_{h,k},
for each item a_i in L_h we keep track of the maximum Corr
value among (h, k)-itemsets containing a_i. As a result, if
item a_1, which has the smallest support and sits at the top of
L_h, has maximum Corr below γ, we conclude, by Corollary 2,
that all supersets of a_1 in subsequent columns of the search
space table starting from k+1 are non-positive. If we were
to compute just positive correlations, we would implicitly
remove item a_1 from the database. After removing it, another
item, a_2, becomes the item with the smallest support, and
if the above condition holds for item a_2, we could remove it
too. We could continue removing items from the top of L_h,
until, for some item a_j, a positively correlated (h, k)-itemset
exists. Now we have a set R_h of j-1 items, for which we
know that all their supersets of size more than k are
non-positive, and which present candidates for removal from the
database.

After computing k-itemsets in at least two consecutive
cells Q_{h,k} and Q_{h+1,k}, we have two lists R_h and R_{h+1} of
single items, whose supersets of size more than k are
non-positive. Then, for each item a_i from R_{h+1}, if its higher-level
abstraction is in R_h, then all supersets of a_i are not a part
of a flipping pattern, and thus can be pruned.

We call this pruning method Single-Item Based Pruning 
(SIBP). 
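
A minimal Python sketch of SIBP is given below. It assumes that
item supports and the per-item maximum correlation for the current
column are already available as dictionaries; the names
`removal_prefix`, `sibp_prunable` and `parent_of`, as well as the
toy numbers, are illustrative and not taken from the paper.

```python
def removal_prefix(supports, max_corr, gamma):
    """Build the removal list for one level.

    Items are visited in increasing order of support; the scan stops
    at the first item that takes part in a positive itemset. An item
    missing from `max_corr` appears in no counted itemset and is
    treated here as non-positive.
    """
    removable = []
    for item in sorted(supports, key=supports.get):
        if max_corr.get(item, 0.0) >= gamma:
            break
        removable.append(item)
    return removable


def sibp_prunable(r_parent, r_child, parent_of):
    """Items of the lower level whose supersets can be pruned: the item
    is removable at its own level and so is its generalization."""
    parents = set(r_parent)
    return {item for item in r_child if parent_of[item] in parents}


# Toy values for two consecutive levels, threshold gamma = 0.3.
sup_1 = {"drinks": 50, "baby": 10, "meat": 30}
corr_1 = {"drinks": 0.6, "baby": 0.1, "meat": 0.2}
sup_2 = {"beer": 40, "diapers": 4, "wipes": 6, "pork": 20}
corr_2 = {"beer": 0.7, "diapers": 0.05, "wipes": 0.5, "pork": 0.1}
parent_of = {"beer": "drinks", "diapers": "baby",
             "wipes": "baby", "pork": "meat"}

r_1 = removal_prefix(sup_1, corr_1, 0.3)
r_2 = removal_prefix(sup_2, corr_2, 0.3)
pruned = sibp_prunable(r_1, r_2, parent_of)
```

Note that `pork` is not pruned in the toy data even though its own
maximum correlation is low: the scan of the lower level stops at
`wipes`, which has a smaller support and a positive itemset, so the
smallest-support condition of Corollary 2 no longer holds for
`pork`.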

4.4 The complete algorithm 

Algorithm 1 presents the pseudocode of Flipper. At lines
2-7, we compute the two top rows of the search space, counting
itemsets in both cells for each k simultaneously. We
apply the TPG principle to terminate the horizontal extension
as early as possible. At lines 8-15, we compute the rest
of the search space in the row-wise manner. All k-itemsets
which contain single items disqualified by the SIBP principle
are pruned. Also, we apply the termination condition TPG
to check whether we can terminate the horizontal extension.

Algorithm 1: The Flipper Algorithm. TPG stands
for termination of the pattern growth (Theorem 3), and
SIBP stands for pruning based on a single item (Theorem
2 and Corollary 2).

input  : a transactional database D = {D_1, D_2, ..., D_n}, a
         taxonomy tree T, correlation thresholds γ and ε,
         minimum support thresholds θ_h for 1 ≤ h ≤ H
output : all flipping patterns

 1  scan D and find frequent 1-items for each taxonomy level;
 2  for k = 2, ..., K do
 3      scan D to compute Corr for all candidate
        (1, k)-itemsets and (2, k)-itemsets;
 4      prune based on support, flipping and SIBP;
 5      if TPG(Q_{1,k}, Q_{2,k}) then break;
 6  end
 7  eliminate non-flipping patterns in rows 1 and 2;
 8  for h = 3, ..., H do
 9      for k = 2, ..., K do
10          scan D to compute Corr for candidate
            itemsets in Q_{h,k};
11          prune based on support, flipping and SIBP;
12          if TPG(Q_{h-1,k}, Q_{h,k}) then break;
13      end
14      eliminate non-flipping patterns in rows h-1 and h;
15  end
16  check each non-empty Q_{H,k} and report flipping patterns;

5. EXPERIMENTAL EVALUATION 

We present an evaluation of the pruning principles described in
the previous section. The goal of the performance experiments
was to see if the number of candidates to be evaluated drops
significantly when the proposed pruning techniques are used in
addition to support-based pruning. To assess the pruning
power of each principle we started from a baseline version -
the level-wise Apriori algorithm ("BASIC"), and then
incrementally enhanced it with pruning by flipping ("FLIPPING
PRUNING"), termination of pattern growth ("TPG"), and
single-item based pruning ("SIBP"). The BASIC Apriori
algorithm can be regarded as the baseline and represents all
previous methods, which compute all frequent patterns
before ranking the correlations by surprisingness [6],
or before removing the redundancy [12]. We test our new
methods in the Apriori-like framework due to the simplicity
of modeling the search space as a two-dimensional table. In
all experiments, we use the Kulc correlation measure, which is
more tolerant for finding correlations in unbalanced datasets
[22].
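
The tolerance of Kulc to unbalanced supports can be seen on a small
numeric example. The Python sketch below assumes the standard
Kulczynski form of the measure, i.e. the average of sup(A)/sup(a_i)
over the items of A, and the usual Cosine measure for pairs; the
support counts are made up for illustration.

```python
from math import sqrt


def kulc(sup_itemset, item_supports):
    """Kulczynski measure (assumed standard form): the mean of
    sup(A) / sup(a_i) over all items a_i of the itemset A."""
    return sum(sup_itemset / s for s in item_supports) / len(item_supports)


def cosine_pair(sup_ab, sup_a, sup_b):
    """Cosine measure for a pair of items."""
    return sup_ab / sqrt(sup_a * sup_b)


# Balanced pair: both items occur 10 times, always together.
balanced = kulc(10, [10, 10])

# Unbalanced pair: b occurs 10 times, always with a, but a is very
# frequent on its own (1000 occurrences).
skewed_kulc = kulc(10, [1000, 10])
skewed_cosine = cosine_pair(10, 1000, 10)
```

For the unbalanced pair Kulc stays near 0.5 while Cosine drops to
0.1. Neither function takes the total number of transactions as an
argument, which reflects the null-invariance of both measures.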

All versions perform counting by sequential scans of
disk-resident input data. Thus, in general, they scale to massive
inputs. The main memory is used to store the remaining
candidates. The candidates are pruned after finishing each
cell Q_{h,k}, in order to keep main memory usage to
a minimum. The experiments were performed on a
Linux (ver 2.6.18) server with quad core Xeon 5500
processors and 48 GB of main memory. BASIC consumed up
to 40 GB of RAM to store all the candidates, while the
enhanced versions never required more than 2 GB of memory.
In experiments with real datasets, we found that in order
to produce flipping patterns we need to set the minimum
support threshold very low, which did not allow us to
compare with the BASIC Apriori algorithm. For such low
supports, the exponential explosion of candidate itemsets
to be kept simultaneously in main memory leads to
memory overflow and disk thrashing. In contrast, the number
of remaining candidates after pruning by the new methods is
reasonably small (see Table 4). This demonstrates that, for
a low-support range, Flipper is significantly more scalable
than existing support-pruning based mining algorithms. In
addition, note that in our approach we generate a small
subset of unexpected patterns, rather than the complete pool
of frequent itemsets.

Figure 8: Performance for synthetic datasets.

5.1 Synthetic datasets 

We studied the influence of different parameters on the
performance of Flipper. For this, the synthetic datasets
were generated using the generator by Srikant and Agrawal [17].
We have set the following default parameter values: number
of transactions N = 100K, average number of items
per transaction (transaction width) W = 5, number of
distinct items |X| = 1,000, number of hierarchy levels H = 4.
The number of distinct categories at the first level is 10,
the fanout is 5. The default set of thresholds is as follows:
minimum support thresholds (θ_1 = 1%, θ_2 = 0.1%, θ_3 =
0.05%, θ_4 = 0.01%) and correlation thresholds (γ = 0.3, ε =
0.1).

Minimum Support: Because we used 4 minimum support
thresholds, one for each level of the hierarchy, we made
a value-decreasing sequence of 10 minimum support threshold
profiles described in Table 3. Profile thr1 is a profile
with high support thresholds for all levels. Starting from
thr2, we lowered minimum support thresholds for each
hierarchy level one at a time.

The results are shown in Figure 8(a). For the case of a
high minimum support (thr1), the running time is low for
all methods, indicating that pruning based only on support
(BASIC) works well for high minimum support thresholds.
However, for lower minimum supports pruning by support
becomes insufficient. The minimum support threshold at the
bottom level of the hierarchy, θ_4, has the largest impact on the
performance. We observe a sudden increase in the running
time of our baseline method for thr2, thr6 and thr10, when
θ_4 is lowered. This is because the largest number of
distinct items is on the bottom level of the hierarchy. For low
minimum supports, the total number of frequent patterns
explodes. Using all of the new pruning techniques together
makes the computation up to 30 times faster.

Number of Transactions: In Figure 8(b), we used 5 
different datasets varying N from 100K to 1M. For all meth- 
ods, the running time shows linear dependency on N. With 
all new pruning methods, Flipper runs 15-20 times faster 
than the baseline method. 

Average transaction width: Figure 8(c) shows results 
for 6 different datasets with default parameters, where the 
average transaction width W is increased from 5 to 10. By 
increasing W we get more frequent patterns. For larger W 
we see a dramatic increase in running time for our base- 
line method, while our new techniques handle the increas- 
ing density gracefully. Flipper with full pruning could run 
up to 5, 10, and 300 times faster than FLIPPING+TPG, 
FLIPPING, and BASIC methods respectively. 

Correlation Thresholds: Because we have two parameters
(γ, ε) for correlation thresholds, we used a value-increasing
sequence of 7 profiles for this experiment. For
the first 5 profiles we fixed the negative threshold ε at 0.1
and increased positive threshold values by 0.1, and for the
rest we fixed the positive threshold γ at 0.6 and increased
negative threshold values by 0.2.

We remind the reader that our advanced pruning is based
on the non-positivity of candidate patterns. Hence, the
efficiency of pruning grows when γ becomes larger and the
number of positive itemsets drops. Figure 8(d) shows the
corresponding result: the larger γ is, the more candidates
are pruned by all 3 pruning methods, and the faster the
computation. Note that the baseline method does not
depend on correlation thresholds, since it generates all frequent
itemsets and disregards the correlation values.

The general conclusion from these experiments is that if 
we want to obtain correlations in itemsets with low supports 
in dense transactional databases, using the baseline Apriori 
algorithm is computationally infeasible, and the new prun- 
ing methods are quite useful for this scenario. 

Based on this performance evaluation, we may suggest
the following guidance for parameter settings. First,
different support thresholds should be set for each level of the
hierarchy. The best strategy is to set support thresholds
comparatively high at the upper levels, and then lower them
toward the more detailed levels. The support for the bottom level
should be set considerably low, otherwise all the itemsets
could be pruned. Such a low level of support was unattainable
by previous methods, due to the enormous number of
candidates which need to be considered. Second, the data
expert should set the positive correlation threshold γ. The
efficiency of Flipper is due to the pruning of non-positive
itemsets, so the main performance factor is the careful choice
of γ. Then the user may start by setting the negative
threshold just below γ, and gradually decrease it until a
satisfactory number of flipping patterns is obtained.

Figure 9: Comparative performance for real
datasets.
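
The threshold-tuning procedure suggested above can be sketched as a
simple loop. This is only an illustration: `mine` is a hypothetical
stand-in for a run of the miner with the given thresholds,
"satisfactory" is interpreted here as "at most `max_patterns`
patterns", and the step size is an arbitrary choice.

```python
def tune_epsilon(mine, gamma, max_patterns, step=0.05):
    """Start with the negative threshold just below gamma and lower it
    until the number of reported patterns is small enough.

    `mine(gamma, epsilon)` is a hypothetical callback returning the
    list of flipping patterns found with these thresholds.
    """
    epsilon = round(gamma - step, 10)
    while epsilon > 0:
        patterns = mine(gamma, epsilon)
        if len(patterns) <= max_patterns:
            return epsilon, patterns
        epsilon = round(epsilon - step, 10)  # rounding avoids float drift
    return 0.0, mine(gamma, 0.0)


# Stand-in miner: pretends each value is the correlation of the negative
# member of one candidate pattern (made-up numbers).
negative_corrs = [0.02, 0.08, 0.12, 0.18, 0.22]


def fake_mine(gamma, epsilon):
    return [c for c in negative_corrs if c <= epsilon]


best_epsilon, found = tune_epsilon(fake_mine, gamma=0.3, max_patterns=2)
```

Each iteration corresponds to one full mining run, so in practice
the step size trades tuning precision against total running time.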

5.2 Reality check 

We applied the market-basket concept to the following 
real-life datasets: 

The GROCERIES dataset [5]² represents 1 month of
point-of-sale transactions in a local grocery store. The
taxonomy of items is provided and it represents the item
categorization used in this store. The dataset contains 9,800
transactions, and the taxonomy has three levels of
abstraction.

The CENSUS dataset [10]³ is an extract from the US
Census 1996. It represents multi-attribute records, where
each record characterizes a single person. The income attribute
is discretized into two bins: income > $50K/yr or ≤ $50K/yr.
We considered each record as a transaction. We manually
created hierarchies with two and three levels based on
different attribute combinations. For example, for the group
of people with occupation: executive and sex: women, the higher
level generalization is all people with occupation: executive.
Then a flipping pattern would be the finding that occupation:
executive is strongly positively correlated with income
> $50K/yr and that this correlation becomes negative for
the sub-population women executives. This dataset contains
32,000 transactions.

The MEDLINE dataset is a set of medical paper
citations. Each citation (paper) is a transaction. The items
are the topics. The hierarchy of topics was obtained from
the Medical Subject Headings database (MeSH)⁴. This
hierarchical terminology was used to manually index each article
in the MEDLINE database. Each paper contains one or several
categorical topics assigned to it. Our working set contains
all medical papers published in year 2010⁵ (640,000
citations), and we consider only the three top levels of the detailed
hierarchy tree⁶.

In Figure 9(a), we show the performance results for the
naive flipping-based pruning vs. full Flipper with our two
new pruning techniques. We exclude the baseline Apriori
method, which runs longer than 10 hours even for the
smallest dataset, GROCERIES. Figure 9(b) shows the memory
consumption of these two programs. Again, the baseline
Apriori method is not included because it requires more than
48 GB to store all frequent candidates, more memory than
was available to us. Note that the space was used to store all
candidate itemsets with their counts, and still our full
version never required more than 2 GB of RAM. This points
out that mining correlations directly produces a small subset
of candidates, which can be efficiently handled by a modern
machine. It is possible to further reduce memory consumption
by exploring various optimization methods: for example,
for generating candidates in cell Q_{h,k} we only need to
keep the results for Q_{h,k-1}, and to apply the termination
of the pattern growth and the single-item based pruning
we need to simultaneously keep in memory the results only
for two consecutive cells Q_{h-1,k} and Q_{h,k}. The candidates
from the other cells can be eliminated, making the number
of itemsets in memory reasonably small. This confirms the
efficiency of our method, which generates a small number of
interesting patterns without enumerating a large number of
candidates.

² http://rss.acs.unt.edu/Rdoc/library/arules/data/
³ http://archive.ics.uci.edu/ml/datasets/Adult
⁴ http://www.nlm.nih.gov/mesh/
⁵ http://mbr.nlm.nih.gov/Download/index.shtml
⁶ http://www.nlm.nih.gov/mesh/2010/mesh_browser

The flipping patterns were almost absent from the synthetic
datasets used in our experiments, but for real datasets, we
could produce a reasonable number of flipping patterns, as
shown in Table 4. Note that for the low-support profiles
used in our experiments, the number of flipping patterns
is substantially lower than the total number of all negative
and positive patterns. For high support thresholds, the total
number of positive and negative patterns decreases
significantly, but all these patterns are trivial, and none of them
is flipping. On the other hand, many of the discovered
flipping patterns are interesting and unexpected. Of
course, they are contained in the set of all positive and
negative patterns; however, it is much harder to find them there.
Moreover, for larger datasets the computation of the entire
set of all negative patterns is infeasible.



Table 4: Number of flipping patterns vs. all positive
and negative frequent patterns in datasets GROCERIES (G),
CENSUS (C) and MEDLINE (M).

In Figures 10, 11 and 12 we present a pair of flipping
correlation patterns for each dataset. Each example shows
positive or negative correlation between a pair of items at the
most detailed level of abstraction (bottom), accompanied
by the corresponding contrasting correlations between their
abstractions at higher levels.

Figure 10: Flipping correlation in Groceries dataset.

Flipping in GROCERIES dataset (Figure 10). Pattern A
reflects the famous itemset {beer, diaper}, now in a
more highlighted way: by showing the negative correlation
between their minimal generalizations. The second example
demonstrates that the flipping patterns can be used to
design more user-friendly store layouts. It often happens
that customers expect to find some product combinations
in close proximity, while by store design these items belong
to different and unrelated categories. For example, in
Pattern B, pork and salad dressing are positively correlated,
while in general pork and delicatessen are negatively
correlated. This might suggest removing the salad dressing
from delicatessen, and moving it closer to the meat
department. Many other patterns from this dataset are surprising
and actionable. For example, the strong negative correlation
between eggs and fish is accompanied by a positive
correlation between their higher categories, fresh products and
meat&fish. The strong positive correlation between baby
cosmetics and oil is highlighted by the negative correlation
of such unrelated product categories as cosmetics and oils.

Figure 11: Flipping correlation in Census dataset.

Flipping in CENSUS dataset (Figure 11). These patterns
suggest that Flipper can be used to compare
characteristics of different sub-populations organized into
hierarchical categories. From Pattern A we learn that
education matters: people working in Craft-repair and having
a Bachelor's degree are positively correlated with income
> $50K/yr, while their generalization group - all people
working in Craft-repair - is negatively correlated with income
> $50K/yr. Pattern B suggests that it is hard to earn $50K
per year if you are in the age group 60-65, unless you are an
executive.

Figure 12: Flipping correlations in MEDLINE
dataset.

Flipping in MEDLINE dataset (Figure 12). The
suggestions of new research topic combinations obtained from
this dataset can be used by researchers in the medical field.
Pattern A: if substance-related mental disorders were often
studied together with temperance, then it is quite
reasonable to research the combination of the withdrawal
syndrome with temperance, which is underrepresented in current
medical publications. Pattern B may suggest a collaboration
between the two unrelated areas of psychophysiology and
psychotherapy. However, if one decides to study the
combination of such sub-topics as biofeedback and behavior therapy,
one finds out that these two are often studied together.



To summarize, flipping correlation patterns can be used
to find items which were assigned to the wrong
category; to find surprising non-trivial correlations to be
explained; to discover underrepresented or overrepresented
combinations of items; or to discover correlations specific
to some sub-population. All these new insights
into the data become possible with our new approach.

6. RELATED WORK 

The market basket concept [1] was generalized into a
notion of correlations in the pioneering work of Brin et al. [3].
The common approach is to compute correlations between
items in each frequent itemset, implying that all frequent
itemsets should be generated first. The best pattern mining
algorithms (e.g., [1, 8]) rely heavily on support-based
pruning. However, low support thresholds have an adverse
effect on the efficiency of these algorithms, calling essentially
for counting all possible combinations of items and leading
to a very inefficient (and often infeasible) computation. This
is especially true when the goal is to include negative
correlations, which by definition occur in itemsets with very
low support ([3, 15, 16, 23, 2, 22]). As a result, the complete
set of all negatively correlated items has so far been impossible
to produce directly [20]. An indirect method for computing
negative correlations [15] is called "support expectation
based on concept hierarchy". Similar to other
expectation-based techniques, the correlations estimated by this method
cannot be reliable because the measure is not null-invariant.

The focus of the research has shifted from producing a
complete set of all correlations to discovering only
interesting and non-trivial patterns among the vast number of all
possible frequent patterns. Multiple interestingness measures
were proposed, both subjective [13] and objective [19].
A comprehensive review of different interestingness measures
can be found in the book by Hilderman and Hamilton [9].
The idea of using taxonomies for pruning of redundant
correlations (rules) was first introduced by Srikant
and Agrawal [17]. Ranking correlations (rules) based on the
distance between participating items in a given taxonomy
tree was studied by Hamani and Maamri [6]. This is a
particular example of a more general approach [13], where a
rule (positive correlation) is considered interesting only if
it contradicts a rule from a set of pre-defined user
beliefs. In this example [6], the user beliefs are presented in the
form of hierarchical categories, and all the high-confidence
rules are ranked by surprisingness, which is proportional to
the number of edges on the shortest path between taxonomy
tree nodes (items). A similar work [7] discusses how to
mine certain "level-crossing" rules. [14] extends the previous
works to mine multilevel association rules directly from
hierarchies.

To our knowledge, the mining and use of flipping
correlation patterns as highly contrasting level-specific correlations
has not been proposed or studied before. The use of
null-invariant correlation measures was discussed recently in the
context of positive correlations [22]. We extend these
measures to negative correlations. Though pruning based
on a non-antimonotonic null-invariant correlation measure
(Kulc) was introduced in the work of Wu et al. [22], the
proposed pruning turned out to be efficient only for high
supports and for high correlation thresholds, and thus could not
be used as a baseline for mining flipping patterns, which
includes finding correlations below the negative correlation
threshold.

Our mining methodology is novel as well, since unlike
previous works (such as [6]) we directly mine the flipping
correlations, thereby eliminating the inefficient frequent itemset
mining step. Our results clearly indicate that previous
approaches, represented by our baseline experiments, are
neither as efficient nor as expressive as the flipping correlation
mining method introduced in this paper.

7. CONCLUSIONS AND FUTURE WORK 

In this paper, we introduced a new concept of flipping
correlation patterns. We presented an efficient algorithm,
Flipper, for mining these new patterns. Despite the fact that the
selected correlation measure, Kulc, is not anti-monotonic,
we developed sharp pruning techniques, based on flipping
constraints and mathematical properties of Kulc. We
generalized these new techniques to all popular null-invariant
correlation measures, such as Coherence, Cosine, All
Confidence and Max Confidence. In our experiments with low
support thresholds, we demonstrated the high efficiency of
the new correlation-based pruning compared to pruning
based exclusively on support. Using real datasets, we have
shown that interesting new observations can be extracted
from the data by using the flipping pattern concept.

As future work, the flipping pattern concept can be
extended for discovering a set of discriminative correlations
that are specific to a given sub-group. Another challenging
topic for future research is the choice of threshold values γ
and ε. This is because a data expert might not be able to
say which correlation value should be considered positive or
negative in a particular dataset. One of the possible
solutions to this problem is to produce the top-K "most flipping"
patterns, which could be defined as the patterns with the
largest gap between correlation values at different hierarchy
levels. Flipper is the first algorithm for level-specific
contrasting correlations. It uses new correlation-based pruning
methods in a simple Apriori-like framework. The use of
more advanced data structures and more advanced pruning
methods might also be a fruitful area of future research.